Overview

process_historical_market_breadth.py calculates day-by-day market breadth indicators across the entire stock universe, generating a historical time-series dataset for the Market Breadth Dashboard charts.
Pipeline Position: Phase 4 - Historical analytics generation
Critical Function: Powers breadth trend charts with 250 days of advance/decline, SMA breadth, and momentum indicators

Purpose

This script:
  • Processes 250 trading days of historical OHLCV data for all tracked stocks
  • Calculates daily breadth metrics (advances, declines, SMA breadth, etc.)
  • Merges stock breadth with major index price data
  • Outputs a CSV file in a specific row-based format for dashboard consumption

Input Files

all_stocks_fundamental_analysis.json
JSON
required
Master stock list to determine which symbols to process
ohlcv_data/*.csv
CSV
required
Individual stock OHLCV files with columns: Date, Open, High, Low, Close, Volume
indices_ohlcv_data/NIFTY.csv
CSV
required
Nifty 50 OHLCV data used to establish the master timeline (last 250 trading days)
indices_ohlcv_data/*.csv
CSV
required
Index OHLCV files for:
  • NIFTY_MIDCAP_150.csv
  • NIFTY_SMALLCAP_250.csv
  • NIFTY_MIDSMALLCAP_400.csv
  • NIFTY_500.csv

Output Files

market_breadth.csv
CSV
Row-based CSV with each metric as a row and dates as columns. Format:
Type of Info,2025-05-15,2025-05-16,2025-05-17,...
Up by 4% Today,23,45,12,...
Down by 4% Today,8,15,5,...
5 Day Ratio,1.45,1.52,1.38,...
Above 200MA %,68.5,69.2,70.1,...
Nifty 50,22450.30,22523.15,22601.80,...
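Because the file stores metrics as rows and dates as columns, a consumer typically transposes it back into the usual dates-as-rows shape. A minimal sketch (the sample data below is illustrative, not real output):

```python
import io

import pandas as pd

# Small illustrative sample of the row-based layout.
sample = """Type of Info,2025-05-15,2025-05-16
Up by 4% Today,23,45
Above 200MA %,68.5,69.2
"""

# Read with the metric name as the index, then transpose so the
# index becomes dates and each column is a metric time series.
df = pd.read_csv(io.StringIO(sample), index_col="Type of Info")
ts = df.T
```

After the transpose, `ts["Up by 4% Today"]` is a per-date series ready for plotting.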
market_breadth.json.gz
JSON (gzipped)
Compressed JSON version of the breadth data (currently a placeholder in the code)
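Since the gzipped JSON output is still a placeholder, the eventual payload shape is an assumption. One plausible sketch of how the companion file could be written and read back:

```python
import gzip
import json

# Hypothetical payload shape; the script currently writes a placeholder.
breadth = {
    "dates": ["2025-05-15", "2025-05-16"],
    "Up by 4% Today": [23, 45],
}

# Write the compressed JSON companion file.
with gzip.open("market_breadth.json.gz", "wt", encoding="utf-8") as f:
    json.dump(breadth, f)

# Read it back for verification.
with gzip.open("market_breadth.json.gz", "rt", encoding="utf-8") as f:
    restored = json.load(f)
```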

Processing Logic

1. Master Timeline Establishment

Uses Nifty 50’s last 250 trading days as the reference timeline:
LOOKBACK_DAYS = 250

nifty_path = os.path.join(INDEX_OHLCV_DIR, "NIFTY.csv")
nifty_df = pd.read_csv(nifty_path)
timeline = nifty_df['Date'].tail(LOOKBACK_DAYS).tolist()
date_to_idx = {date: i for i, date in enumerate(timeline)}
num_days = len(timeline)
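Note that `tail(LOOKBACK_DAYS)` assumes the Nifty CSV is already sorted oldest-to-newest. A defensive variant (a sketch, not in the script itself) sorts by date first; ISO-format date strings sort chronologically:

```python
import pandas as pd

# Illustrative out-of-order input; real data comes from NIFTY.csv.
nifty_df = pd.DataFrame({"Date": ["2025-05-16", "2025-05-15", "2025-05-17"]})

# Sorting first makes tail() robust to the file's row order.
nifty_df = nifty_df.sort_values("Date")
timeline = nifty_df["Date"].tail(250).tolist()
```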

2. Breadth Counter Initialization

Creates NumPy arrays for efficient metric storage:
# 1-D counter arrays, one slot per trading day in the timeline
advances = np.zeros(num_days)
declines = np.zeros(num_days)
above_200ma = np.zeros(num_days)
above_50ma = np.zeros(num_days)
above_20ma = np.zeros(num_days)
above_10ma = np.zeros(num_days)
up_4pc = np.zeros(num_days)
down_4pc = np.zeros(num_days)
high_52w = np.zeros(num_days)
low_52w = np.zeros(num_days)
vol_plus = np.zeros(num_days)
vol_minus = np.zeros(num_days)

3. Stock-Level Processing

For each stock, calculates technical indicators and updates daily counters:
for csv_path in csv_files:
    symbol = os.path.basename(csv_path).replace(".csv", "")
    if symbol not in valid_symbols: continue
    
    # Re-read full history for technicals to avoid edge effects
    full_df = pd.read_csv(csv_path)
    full_df['SMA_10'] = full_df['Close'].rolling(10).mean()
    full_df['SMA_20'] = full_df['Close'].rolling(20).mean()
    full_df['SMA_50'] = full_df['Close'].rolling(50).mean()
    full_df['SMA_200'] = full_df['Close'].rolling(200).mean()
    full_df['Vol_SMA_20'] = full_df['Volume'].rolling(20).mean()
    full_df['H_52W'] = full_df['High'].rolling(252).max()
    full_df['L_52W'] = full_df['Low'].rolling(252).min()
    full_df['Prev_Close'] = full_df['Close'].shift(1)
    full_df['Daily_Ret'] = ((full_df['Close'] - full_df['Prev_Close']) / full_df['Prev_Close']) * 100

    # Filter back to timeline
    analysis_df = full_df[full_df['Date'].isin(timeline)]
    
    for _, row in analysis_df.iterrows():
        idx = date_to_idx.get(row['Date'])
        if idx is None: continue
        
        # Metrics Calculation
        if row['Close'] > row['Prev_Close']: advances[idx] += 1
        if row['Close'] < row['Prev_Close']: declines[idx] += 1
        
        if row['Close'] > row['SMA_200']: above_200ma[idx] += 1
        if row['Close'] > row['SMA_50']: above_50ma[idx] += 1
        if row['Close'] > row['SMA_20']: above_20ma[idx] += 1
        if row['Close'] > row['SMA_10']: above_10ma[idx] += 1
        
        if row['Daily_Ret'] >= 4: up_4pc[idx] += 1
        if row['Daily_Ret'] <= -4: down_4pc[idx] += 1
        
        if row['High'] >= row['H_52W']: high_52w[idx] += 1
        if row['Low'] <= row['L_52W']: low_52w[idx] += 1
        
        if row['Volume'] > row['Vol_SMA_20']: vol_plus[idx] += 1
        else: vol_minus[idx] += 1
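The per-row `iterrows` loop could also be vectorized: compute each boolean condition column-wise, then scatter-add into the counters at the timeline positions. A sketch under the same names as above (the tiny DataFrame here is illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for the loop's inputs.
analysis_df = pd.DataFrame({
    "Date": ["2025-05-15", "2025-05-16"],
    "Close": [105.0, 99.0],
    "Prev_Close": [100.0, 105.0],
    "SMA_200": [90.0, 100.0],
})
date_to_idx = {"2025-05-15": 0, "2025-05-16": 1}
num_days = 2
advances = np.zeros(num_days)
above_200ma = np.zeros(num_days)

# Map each row's date to its timeline slot, then add the boolean
# flags (True counts as 1) into the counters in one shot.
idx = analysis_df["Date"].map(date_to_idx).to_numpy()
np.add.at(advances, idx, (analysis_df["Close"] > analysis_df["Prev_Close"]).to_numpy())
np.add.at(above_200ma, idx, (analysis_df["Close"] > analysis_df["SMA_200"]).to_numpy())
```

`np.add.at` is used rather than plain indexing so that repeated timeline slots (multiple stocks on the same date) accumulate correctly.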

4. Advance/Decline Ratio Calculation

Calculates rolling A/D ratios:
def calc_ratio(adv, dec, window):
    r = []
    for i in range(len(adv)):
        start = max(0, i - window + 1)
        sum_adv = sum(adv[start:i+1])
        sum_dec = sum(dec[start:i+1])
        ratio = round(sum_adv / sum_dec, 2) if sum_dec > 0 else 1.0
        r.append(ratio)
    return r

rows.append(to_csv_row("5 Day Ratio", calc_ratio(advances, declines, 5)))
rows.append(to_csv_row("10 Day Ratio", calc_ratio(advances, declines, 10)))
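A tiny worked example of `calc_ratio` (reproduced here so the snippet is self-contained): with a window of 2, each day's ratio sums the current and previous day's counts.

```python
def calc_ratio(adv, dec, window):
    # Rolling advance/decline ratio; 1.0 when no declines in the window.
    r = []
    for i in range(len(adv)):
        start = max(0, i - window + 1)
        sum_adv = sum(adv[start:i + 1])
        sum_dec = sum(dec[start:i + 1])
        r.append(round(sum_adv / sum_dec, 2) if sum_dec > 0 else 1.0)
    return r

advances = [30, 10, 20]
declines = [10, 10, 0]
ratios = calc_ratio(advances, declines, 2)  # [3.0, 2.0, 3.0]
```

Day 1: 30/10 = 3.0; day 2: (30+10)/(10+10) = 2.0; day 3: (10+20)/(10+0) = 3.0.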

5. CSV Assembly

Assembles the final CSV in row-based format:
rows = []
rows.append("Type of Info," + ",".join(timeline))

# Momentum Indicators
rows.append(to_csv_row("Up by 4% Today", up_4pc.astype(int)))
rows.append(to_csv_row("Down by 4% Today", down_4pc.astype(int)))

# A/D Ratios
rows.append(to_csv_row("5 Day Ratio", calc_ratio(advances, declines, 5)))
rows.append(to_csv_row("10 Day Ratio", calc_ratio(advances, declines, 10)))

# Breadth Percentages
total_tracked = max(processed_count, 1)
rows.append(to_csv_row("Above 200MA %", np.round(above_200ma / total_tracked * 100, 1)))
rows.append(to_csv_row("Above 50MA %", np.round(above_50ma / total_tracked * 100, 1)))
rows.append(to_csv_row("Above 20MA %", np.round(above_20ma / total_tracked * 100, 1)))
rows.append(to_csv_row("Above 10MA %", np.round(above_10ma / total_tracked * 100, 1)))

# 52-Week Extremes
rows.append(to_csv_row("Reached 52w High", high_52w.astype(int)))
rows.append(to_csv_row("Reached 52w Low", low_52w.astype(int)))

# Volume
rows.append(to_csv_row("Volume greater than 20Day Average", vol_plus.astype(int)))
rows.append(to_csv_row("Volume less than 20Day Average", vol_minus.astype(int)))

# Raw Counts
rows.append(to_csv_row("Advances", advances.astype(int)))
rows.append(to_csv_row("Declines", declines.astype(int)))

# Index Prices
for label, prices in index_data.items():
    rows.append(to_csv_row(label, prices))
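The helper `to_csv_row` is used throughout but not shown in the snippets above. A minimal version consistent with its usage (an assumption about the actual implementation) would be:

```python
def to_csv_row(label, values):
    """Join a metric label with its per-day values into one CSV row."""
    return label + "," + ",".join(str(v) for v in values)

row = to_csv_row("Advances", [120, 98, 143])
```

The assembled `rows` list would then be joined with newlines and written to market_breadth.csv.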

Output Metrics

Momentum Indicators

Up by 4% Today
integer[]
Daily count of stocks with +4% or greater return
Down by 4% Today
integer[]
Daily count of stocks with -4% or worse return

Advance/Decline Ratios

5 Day Ratio
float[]
5-day rolling advance/decline ratio
  • Values > 1.0 indicate bullish breadth
  • Values < 1.0 indicate bearish breadth
10 Day Ratio
float[]
10-day rolling advance/decline ratio

Moving Average Breadth

Above 200MA %
float[]
Percentage of stocks trading above their 200-day SMA (daily)
Above 50MA %
float[]
Percentage of stocks trading above their 50-day SMA (daily)
Above 20MA %
float[]
Percentage of stocks trading above their 20-day SMA (daily)
Above 10MA %
float[]
Percentage of stocks trading above their 10-day SMA (daily)

52-Week Extremes

Reached 52w High
integer[]
Daily count of stocks hitting new 52-week highs
Reached 52w Low
integer[]
Daily count of stocks hitting new 52-week lows

Volume Metrics

Volume greater than 20Day Average
integer[]
Count of stocks with above-average volume
Volume less than 20Day Average
integer[]
Count of stocks with below-average volume

Index Prices

Nifty 50
float[]
Daily closing prices for Nifty 50
Nifty 500
float[]
Daily closing prices for Nifty 500
Nifty Midcap 150
float[]
Daily closing prices for Nifty Midcap 150
Nifty Smallcap 250
float[]
Daily closing prices for Nifty Smallcap 250
Nifty Midsmallcap 400
float[]
Daily closing prices for Nifty Midsmallcap 400

Usage Example

python process_historical_market_breadth.py
Expected Output:
⏳ Loading master stock list...
Targeting 2847 stocks for historical breadth.
🧬 Processing stock-level history...
✅ Analyzed 2847 stocks. Merging with Index data...
🚀 Market Breadth Historical Data generated: /path/to/market_breadth.csv

Performance Optimization

  • Uses NumPy arrays for memory efficiency with large datasets
  • Processes full history once per stock to calculate technical indicators correctly
  • Filters to timeline only for final analysis to reduce computation
  • Avoids edge effects by using full historical data for rolling calculations

Data Quality Notes

SMA Edge Effects Prevention: The script reads the full historical CSV for each stock to calculate SMAs properly, then filters to the 250-day timeline. This prevents incorrect SMA values at the beginning of the timeline.
Placeholder Metrics: Some metrics like “Up by 25% in Month” and “Nifty 500 % of W&M RSI > 60” are currently placeholders (zeros) and may be implemented in future versions.